Elena Tuzhilina
Sep 28, 2021
http://web.stanford.edu/~elenatuz/courses/stats32-aut2021/
classes <- list(quarter = "Fall 2018/19",
ID = c("STATS 32", "STATS 101", "STATS 200"),
credits = 12)
classes$ID
## [1] "STATS 32" "STATS 101" "STATS 200"
classes$credits
## [1] 12
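Besides `$`, list elements can be extracted with double brackets `[[ ]]`. Single brackets `[ ]` behave differently: they return a smaller list rather than the element itself. The short sketch below reuses the `classes` list from above to show the difference.

```r
# The same list as above
classes <- list(quarter = "Fall 2018/19",
                ID = c("STATS 32", "STATS 101", "STATS 200"),
                credits = 12)

classes[["credits"]]  # the element itself: the number 12
classes["credits"]    # a list of length 1 that contains the element
classes$ID[2]         # second entry of the ID vector: "STATS 101"
length(classes)       # number of elements in the list: 3
```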
A data frame is a special type of list:
str(mtcars)
## 'data.frame': 32 obs. of 11 variables:
## $ mpg : num 21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
## $ cyl : num 6 6 4 6 8 6 8 4 4 6 ...
## $ disp: num 160 160 108 258 360 ...
## $ hp : num 110 110 93 110 175 105 245 62 95 123 ...
## $ drat: num 3.9 3.9 3.85 3.08 3.15 2.76 3.21 3.69 3.92 3.92 ...
## $ wt : num 2.62 2.88 2.32 3.21 3.44 ...
## $ qsec: num 16.5 17 18.6 19.4 17 ...
## $ vs : num 0 0 1 1 0 1 0 1 1 1 ...
## $ am : num 1 1 1 0 0 0 0 0 0 0 ...
## $ gear: num 4 4 4 3 3 3 3 4 4 4 ...
## $ carb: num 4 4 1 1 2 1 4 2 2 4 ...
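A few checks make the "special type of list" claim concrete. Each column of `mtcars` is one element of the list, and a data frame adds one extra rule: every column has the same length. This is a small sketch using the built-in `mtcars` data.

```r
# mtcars ships with base R (datasets package)
is.list(mtcars)                          # TRUE: a data frame is a list under the hood
length(mtcars)                           # 11: one list element per column
identical(mtcars$mpg, mtcars[["mpg"]])   # TRUE: columns are extracted like list elements
# Unlike a general list, every column has the same length (one entry per row)
all(sapply(mtcars, length) == nrow(mtcars))
```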
str, summary
head, tail
names, dim, nrow, ncol
table
mean, median, sd, var
factor
ggplot2 (and the + syntax)
“The simple graph has brought more information to the data analyst’s mind than any other device.” - John Tukey
## mpg weight cylinders
## Mazda RX4 21.0 2.620 6
## Mazda RX4 Wag 21.0 2.875 6
## Datsun 710 22.8 2.320 4
## Hornet 4 Drive 21.4 3.215 6
## Hornet Sportabout 18.7 3.440 8
## Valiant 18.1 3.460 6
## Duster 360 14.3 3.570 8
## Merc 240D 24.4 3.190 4
## Merc 230 22.8 3.150 4
## Merc 280 19.2 3.440 6
## Merc 280C 17.8 3.440 6
## Merc 450SE 16.4 4.070 8
## Merc 450SL 17.3 3.730 8
## Merc 450SLC 15.2 3.780 8
## Cadillac Fleetwood 10.4 5.250 8
## Lincoln Continental 10.4 5.424 8
## Chrysler Imperial 14.7 5.345 8
## Fiat 128 32.4 2.200 4
## Honda Civic 30.4 1.615 4
## Toyota Corolla 33.9 1.835 4
## Toyota Corona 21.5 2.465 4
## Dodge Challenger 15.5 3.520 8
## AMC Javelin 15.2 3.435 8
## Camaro Z28 13.3 3.840 8
## Pontiac Firebird 19.2 3.845 8
## Fiat X1-9 27.3 1.935 4
## Porsche 914-2 26.0 2.140 4
## Lotus Europa 30.4 1.513 4
## Ford Pantera L 15.8 3.170 8
## Ferrari Dino 19.7 2.770 6
## Maserati Bora 15.0 3.570 8
## Volvo 142E 21.4 2.780 4
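The exploration functions from the recap (`str`, `summary`, `head`, `dim`, `table`, `mean`, `factor`, ...) can be tried on this table. The slides do not show how the table was built. The sketch below assumes it is a subset of `mtcars` in which `wt` and `cyl` were renamed to `weight` and `cylinders`. The name `cars_small` is hypothetical, chosen to avoid masking R's built-in `cars` dataset.

```r
# Hypothetical reconstruction of the mpg / weight / cylinders table
cars_small <- mtcars[, c("mpg", "wt", "cyl")]
names(cars_small) <- c("mpg", "weight", "cylinders")

str(cars_small)              # column types and first values
summary(cars_small)          # min, quartiles, mean and max of each column
head(cars_small, 3)          # first 3 rows; tail() gives the last rows
dim(cars_small)              # 32 rows, 3 columns (also nrow(), ncol())
table(cars_small$cylinders)  # how many cars have 4, 6 or 8 cylinders

mean(cars_small$mpg)         # average mpg
median(cars_small$mpg)       # middle value of mpg
sd(cars_small$mpg)           # standard deviation; var() gives its square

cyl_factor <- factor(cars_small$cylinders)  # treat cylinders as categories
levels(cyl_factor)                          # "4" "6" "8"
```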
What is the distribution of cylinders in my dataset?
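A bar chart is one way to answer this question, since cylinders take only a few distinct values. The sketch below uses ggplot2 on `mtcars`, where the column is called `cyl`. Wrapping it in `factor()` treats it as a category, so the plot gets one bar per value rather than a continuous axis.

```r
library(ggplot2)

# Bar chart: one bar per number of cylinders, bar height = number of cars
p_bar <- ggplot(mtcars, aes(x = factor(cyl))) +
  geom_bar() +
  labs(x = "cylinders", y = "number of cars")
p_bar
```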
What is the distribution of miles per gallon in my dataset?
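For a continuous variable such as mpg, a histogram is the usual choice: the values are cut into bins and the cars in each bin are counted. The bin width of 2.5 below is an arbitrary choice for this sketch.

```r
library(ggplot2)

# Histogram: mpg is cut into bins of width 2.5 and the cars in each bin are counted
# (without binwidth or bins, ggplot2 defaults to 30 bins and prints a message)
p_hist <- ggplot(mtcars, aes(x = mpg)) +
  geom_histogram(binwidth = 2.5)
p_hist
```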
What is the relationship between mpg and weight?
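A scatterplot shows the relationship between two continuous variables, with one point per car. In `mtcars` the weight column is `wt`, measured in units of 1000 lbs.

```r
library(ggplot2)

# Scatterplot: weight on the x-axis, miles per gallon on the y-axis
p_scatter <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  labs(x = "weight (1000 lbs)", y = "miles per gallon")
p_scatter
```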
What is the relationship between mpg and time?
Not so good…
Easier to see the trend
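The time-indexed data behind these two plots is not shown here. As a stand-in, ggplot2's built-in `economics` dataset (monthly US figures, with columns `date` and `unemploy`) illustrates the same contrast. Points alone make a trend over time hard to follow, while `geom_line()` joins consecutive observations so the trend stands out.

```r
library(ggplot2)

# Points only: every month is a separate dot
p_points <- ggplot(economics, aes(x = date, y = unemploy)) +
  geom_point()

# Line: consecutive months are joined, so the trend is easier to see
p_line <- ggplot(economics, aes(x = date, y = unemploy)) +
  geom_line()
p_line
```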
For each value of cylinder, what is the distribution of mpg like?
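Side-by-side boxplots are a common way to compare the distribution of a continuous variable across groups. The sketch below draws one box per cylinder count on `mtcars`. It also prints the group medians, which match the thick line inside each box.

```r
library(ggplot2)

# One box per cylinder count; each box summarises the mpg values of that group
p_box <- ggplot(mtcars, aes(x = factor(cyl), y = mpg)) +
  geom_boxplot() +
  labs(x = "cylinders", y = "miles per gallon")
p_box

# Group medians, for comparison with the line inside each box
tapply(mtcars$mpg, mtcars$cyl, median)
```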
How often does each pair of cylinder and gear occur in the dataset?
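Counting pairs of two categorical variables takes two steps: a two-way `table()`, then a tile plot (heat map) with one tile per pair. The sketch below uses `mtcars`. The variable names `pair_counts` and `pair_df` are illustrative only.

```r
library(ggplot2)

# Count every (cylinders, gear) combination; named arguments label the dimensions
pair_counts <- table(cyl = mtcars$cyl, gear = mtcars$gear)
pair_counts

# Long format: one row per pair, with columns cyl, gear and Freq
pair_df <- as.data.frame(pair_counts)

# Heat map: fill encodes the count, and the count is also printed on each tile
p_tile <- ggplot(pair_df, aes(x = gear, y = cyl, fill = Freq)) +
  geom_tile() +
  geom_text(aes(label = Freq))
p_tile
```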
I have father-son pairs. For each pair, I record their height and weight, as well as their ethnicities. I want to study the relationship between the father's characteristics and the son's. What plots could help me?
ggplot2
ggplot2 package
ggplot2 reference manual
Data: Dataset we are using for the plot
## mpg weight cylinders
## Mazda RX4 21.0 2.620 6
## Mazda RX4 Wag 21.0 2.875 6
## Datsun 710 22.8 2.320 4
## Hornet 4 Drive 21.4 3.215 6
## Hornet Sportabout 18.7 3.440 8
## Valiant 18.1 3.460 6
## Duster 360 14.3 3.570 8
## Merc 240D 24.4 3.190 4
## Merc 230 22.8 3.150 4
## Merc 280 19.2 3.440 6
Geometries: the visual elements used to represent our data
Geom: point
Aesthetics: map data columns to visual properties of the geom, such as position, color or size
3 different aesthetics:
ggplot2 code
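Here is a minimal sketch that combines the three ingredients (data, geom, aesthetics) on `mtcars`, whose columns `wt` and `cyl` correspond to `weight` and `cylinders` in the table above. It assumes the three aesthetics are the x position, the y position and the colour. This is an illustration, not necessarily the slide's exact code.

```r
library(ggplot2)

# data:       mtcars
# aesthetics: x <- wt, y <- mpg, colour <- number of cylinders (as a category)
# geom:       points
p <- ggplot(data = mtcars, mapping = aes(x = wt, y = mpg, colour = factor(cyl))) +
  geom_point()
p
```

Each `+` adds a component to the plot object, so further layers or settings can be appended in the same way.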
Optional material
One graphic contains:
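As a rough sketch (not the slide's own example), a single ggplot2 graphic can stack several components on one base: two geom layers (one of them computed by a statistical fit), a scale, a facet specification and a theme. Each component is added with `+`.

```r
library(ggplot2)

p_layers <- ggplot(mtcars, aes(x = wt, y = mpg)) +   # data + default aesthetics
  geom_point() +                                     # layer 1: raw points
  geom_smooth(method = "lm", formula = y ~ x) +      # layer 2: fitted linear trend
  scale_y_continuous(name = "miles per gallon") +    # scale: how y maps to the axis
  facet_wrap(~ cyl) +                                # facets: one panel per cylinder count
  theme_minimal()                                    # theme: non-data appearance
p_layers
```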
Sometimes we need to tweak the position of the geometric elements because they obscure each other.
Only 9 data points??
Much better
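The data used on these slides is not shown, so this sketch uses `mtcars` instead. Plotting `gear` against `cyl` puts 32 cars on only 8 distinct positions, so most points hide behind others. `position_jitter()` shifts each point by a small random amount so the overlapping points become visible. Its `seed` argument makes the jitter reproducible, and `geom_jitter()` is a shorthand for the same idea.

```r
library(ggplot2)

# Without adjustment: cars with the same (gear, cyl) pair are drawn on top of each other
p_overlap <- ggplot(mtcars, aes(x = gear, y = cyl)) +
  geom_point()

# With jitter: each point moves by at most 0.2 units horizontally and vertically
p_jitter <- ggplot(mtcars, aes(x = gear, y = cyl)) +
  geom_point(position = position_jitter(width = 0.2, height = 0.2, seed = 1))
p_jitter
```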
Default colors
Manually chosen colors
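Colors can be specified with rgb(red, green, blue), each value between 0 and 1: rgb(0,0,1) is blue, rgb(1,0,0) red, rgb(0,0,0) black and rgb(1,1,1) white.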
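This sketch contrasts default and manually chosen colors on `mtcars`. Without a colour scale, ggplot2 picks a discrete palette automatically. `scale_colour_manual()` replaces it with one value per category; here the values are built with `rgb()` and named by level so each cylinder count gets a fixed color.

```r
library(ggplot2)

# rgb() returns a hex colour string
rgb(0, 0, 1)  # "#0000FF"
rgb(1, 0, 0)  # "#FF0000"

# Default colours: palette chosen by ggplot2
p_default <- ggplot(mtcars, aes(x = wt, y = mpg, colour = factor(cyl))) +
  geom_point()

# Manually chosen colours: one value per level of factor(cyl)
p_manual <- p_default +
  scale_colour_manual(values = c("4" = rgb(0, 0, 1),
                                 "6" = rgb(1, 0, 0),
                                 "8" = rgb(0, 0, 0)))
p_manual
```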